Abstract

This paper deals with the problem of classifying signals. A new method for building
so-called local classifiers and local features is presented. The method is a combination of
the lifting scheme and support vector machines. Its main aim is to produce effective
yet comprehensible classifiers that help in understanding the processes hidden behind
the classified signals. To illustrate the method we present results obtained on artificial
and real datasets.

Keywords: local feature, local classifier, lifting scheme, support vector machines, signal 
analysis 






1. Introduction 

Many classification algorithms, such as artificial neural networks, induce classifiers which have
good accuracy but do not give an insight into the real process hidden behind the
problem. Although predictions are made with high precision, such classifiers do not answer
the question "Why?". Even algorithms such as decision trees or rule inducers very often
produce enormous classifiers whose analysis is almost intractable for the human mind. It is
even worse when these algorithms are used for problems of signal classification. In practice,
good accuracy without an explanation of the classification process is useless.

In this article we describe an approach which can help in building classifiers which
are not only very accurate but also comprehensible. The method is based on the idea of
the lifting scheme (Sweldens, 1998). The lifting scheme is used for calculating expansion
coefficients of analysed signals using biorthogonal wavelet bases. The biggest advantage
of this method is that it uses only the spatial domain, in contrast to the classical approach
(Daubechies, 1992) in which the frequency domain is used. As the original lifting scheme did
not give us enough freedom in incorporating adaptation, we used its modified version called
update-first (Claypoole et al., 1998).



Assume we act in the space $\mathbb{R}^n$ spanned by a biorthogonal base $\{\tilde{\phi}_i\}_{i=1}^{n}$ and $\{\phi_i\}_{i=1}^{n}$. Vectors
$\{\tilde{\phi}_i\}_{i=1}^{n}$ and $\{\phi_j\}_{j=1}^{n}$ are biorthogonal in the sense that

$$\langle \tilde{\phi}_i, \phi_j \rangle = \delta_{ij}$$

where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.
Each vector $x \in \mathbb{R}^n$ can be expressed in the following way

$$x = \sum_{i=1}^{n} \alpha_i \phi_i \qquad (1)$$

where $\alpha_i = \langle \tilde{\phi}_i, x \rangle$ are expansion coefficients. A very important feature of the vectors $\{\tilde{\phi}_i\}_{i=1}^{n}$
is that they can be nonzero only for several indices. It implies that for calculating $\langle \tilde{\phi}_i, x \rangle$
only a part of the vector $x$ is needed. This feature is called locality.

The aim of the method presented in this article is to find such an expansion $\alpha$, by
implicitly constructing a biorthogonal base $\{(\phi_i, \tilde{\phi}_i)\}_{i=1}^{n}$, that the coefficients $\alpha_i = \langle \tilde{\phi}_i, x \rangle$ are as
discriminative as possible for the classified signals.

More specifically we assume that a training set $X = \{(x_i, y_i) : x_i \in \mathbb{R}^n, y_i \in \{-1, +1\}, i = 1, \ldots, l\}$
is given. For each base vector $\tilde{\phi}_j$ we get a vector of expansion coefficients $\alpha^j \in \mathbb{R}^l$

$$\alpha^j(i) = \langle \tilde{\phi}_j, x_i \rangle$$

For each such vector we can find a number $b^j \in \mathbb{R}$ called bias for which

$$\mathrm{sgn}(\alpha^j(i) + b^j) = y_i$$

for as many indices $i \in \{1, 2, \ldots, l\}$ as possible.

For calculating expansion coefficients we used the idea of support vector machines (SVM)¹
introduced by Vapnik. SVM proved to be one of the best classifier inducers. Combining
the power of SVM and the locality feature of the designed base we were able to build
classifiers which have a very good classification accuracy and are also easily interpreted.
We present experiments conducted on artificial datasets and a real dataset. The artificial
datasets allowed us to verify our method and to better understand its features. Experiments
conducted on the real dataset proved the usefulness of the method for real applications.



2. Outline of the paper 

The paper is divided into two main parts and an appendix. The first part is devoted
to a description of the method and consists of three subparts. First we present a general
outline of the method; next we introduce some notation, which is used in the following subpart
that gives a detailed description of the method. We end the first part of the paper with a short
summary of the presented method. In the second part of the paper we present the results of
the experiments conducted on both the artificial and the real datasets. In the appendix we
show how to efficiently solve the optimisation problems that arise in the method.



3. Method description 

In this section we describe the new method for designing discriminative biorthogonal
bases for signal classification. In fact we will be computing only expansion coefficients of
some implicitly defined discriminative biorthogonal base. The method is a combination
of the update-first version of the lifting scheme (Claypoole et al., 1998) and proximal support
vector machines (Fung and Mangasarian, 2001).

1. More precisely, we used PSVM, a variant of SVM called proximal support vector machines
(Fung and Mangasarian, 2001).



3.1 Outline of the method 

The method is based on the lifting scheme, a very general and easily modified method
for computing expansion coefficients of an analysed signal with respect to a biorthogonal base.
The method is iterative and each iteration is divided into three steps

• SPLIT - The signal is split into two subsignals containing the even and odd indices.

• UPDATE - A coarse approximation of the analysed signal is computed from the subsignals.

• PREDICT - Wavelet coefficients are calculated using the coarse approximation and
the subsignal containing even indices. Those coefficients are simply inner products
between a weight vector and a small part of the coarse approximation and the even subsignal. We
used proximal support vector machines (Fung and Mangasarian, 2001) to calculate
the weight vector. As PSVM is a procedure for generating classifiers we decided to
call the obtained expansion coefficients discriminative wavelet coefficients.

The coarse approximation is used as the input for the next iteration. As the coarse approximation is
half the length of the original signal, the number of iterations is bounded from above by $\log_2(N)$,
where $N$ is the length of the analysed signal.



3.2 Notation 

Assume we are given a training set $X$

$$X = \{(x_i, y_i) \in \mathbb{R}^N \times \{-1, +1\} : i = 1, \ldots, l\}$$

where $N = 2^n$ for some $n \in \mathbb{N}$. Vectors $x_i$ are sampled versions of the signals we want to
analyse and $y_i \in \{-1, +1\}$ are labels.
Having the set $X$ we create two matrices

$$A = \begin{pmatrix} x_1^T \\ \vdots \\ x_l^T \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} y_1 & & \\ & \ddots & \\ & & y_l \end{pmatrix}$$

Let $I = \{i_1, \ldots, i_k\}$ be a set of integer numbers (indices). We will use the following
shorthand notation for accessing indices $I$ of a vector $x \in \mathbb{R}^N$

$$x(I) = (x(i_1), \ldots, x(i_k))$$

We will also use a special notation for accessing odd and even indices of a vector $x \in \mathbb{R}^N$

$$x_o = (x(1), x(3), \ldots, x(N-1)) \quad \text{for odd indices}$$
$$x_e = (x(2), x(4), \ldots, x(N)) \quad \text{for even indices}$$

Finally we will use the following symbols for special vectors

$$e = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \quad \text{and} \quad e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

The dimensionality of the vectors $e$ and $e_1$ will be clear from the context.

3.3 Three main steps

As we have mentioned before, the method we propose is iterative and each iteration step²
consists of three substeps.

3.3.1 First substep - Split 

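Matrix $A$ is split into matrices $A_o$ (odd columns) and $A_e$ (even columns)

$$A_o = \begin{pmatrix} x_{o,1}^T \\ \vdots \\ x_{o,l}^T \end{pmatrix} \quad \text{and} \quad A_e = \begin{pmatrix} x_{e,1}^T \\ \vdots \\ x_{e,l}^T \end{pmatrix}$$

where $x_{o,i}$ and $x_{e,i}$ denote the odd and even parts of the signal $x_i$.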

3.3.2 Second substep - Update 

Having matrices $A_o \in \mathbb{R}^{l \times N/2}$ and $A_e \in \mathbb{R}^{l \times N/2}$ we create matrix $C \in \mathbb{R}^{l \times N/2}$

$$C = \frac{1}{2}(A_o + A_e)$$

This matrix will be called the coarse approximation of matrix $A$.
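
The SPLIT and UPDATE substeps can be sketched in a few lines. The following is a minimal illustration, assuming that signals are stored as rows of a list of lists and that the update is the plain average of the odd and even subsignals; the function names `split` and `update` are ours and do not come from the paper.

```python
def split(A):
    """SPLIT: separate odd- and even-indexed samples (1-based, as in the text)."""
    A_o = [row[0::2] for row in A]  # x(1), x(3), ..., x(N-1)
    A_e = [row[1::2] for row in A]  # x(2), x(4), ..., x(N)
    return A_o, A_e


def update(A_o, A_e):
    """UPDATE: coarse approximation, assumed to be the element-wise average."""
    return [[(o + e) / 2.0 for o, e in zip(row_o, row_e)]
            for row_o, row_e in zip(A_o, A_e)]


# Two toy signals of length N = 4 (one signal per row).
A = [[1.0, 3.0, 5.0, 7.0],
     [2.0, 2.0, 0.0, 4.0]]
A_o, A_e = split(A)
C = update(A_o, A_e)
```

Each row of `C` is half as long as the corresponding row of `A`, which is why the number of decomposition levels is limited by the signal length.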
2. We also use the name decomposition level for an iteration step.

3.3.3 Third substep - Predict

In the last step we calculate discriminative wavelet coefficients. For each column $k$ of matrix
$A_e$ ($k = 1, 2, \ldots, N/2$) we create matrix $A^k \in \mathbb{R}^{l \times (L_k + 1)}$, where $L_k \in \mathbb{N}$ is an even number
and a parameter of the method.³

$$A^k = \begin{pmatrix} x_{e,1}(k) & -c_1^k \\ \vdots & \vdots \\ x_{e,l}(k) & -c_l^k \end{pmatrix}$$

where $c_i^k = c_i(I_k)$, $c_i$ is the $i$-th row of $C$, and $I_k$ is a set of indices selected in the following way

• If $1 \le k < L_k/2$ then $I_k = \{1, 2, \ldots, L_k\}$

• If $L_k/2 \le k \le N/2 - L_k/2$ then $I_k = \{k - L_k/2 + 1, \ldots, k + L_k/2\}$

• If $N/2 - L_k/2 < k \le N/2$ then $I_k = \{N/2 - L_k + 1, \ldots, N/2\}$
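
The selection of the index set can be illustrated with a short sketch. It assumes 1-based indices, a window of exactly $L_k$ coarse samples that is shifted inwards at the borders, and the boundary conventions shown in the comments; the exact inequalities are our reading of the rule and the function name `index_set` is hypothetical.

```python
def index_set(k, L, n_half):
    """Return I_k as a list of 1-based indices into a coarse signal of length n_half.

    k      -- position of the even sample (1 <= k <= n_half)
    L      -- window length L_k (even, at most n_half)
    n_half -- length N/2 of the coarse approximation
    """
    if L % 2 != 0 or L > n_half or not 1 <= k <= n_half:
        raise ValueError("need even L <= n_half and 1 <= k <= n_half")
    half = L // 2
    if k < half:                    # left border: take the first L indices
        start = 1
    elif k <= n_half - half:        # interior: window around k
        start = k - half + 1
    else:                           # right border: take the last L indices
        start = n_half - L + 1
    return list(range(start, start + L))


# For N = 32 (n_half = 16) and L = 4, the 6th coefficient uses coarse samples 5..8.
window = index_set(6, 4, 16)
```

Since every coarse sample at the first level averages two original samples, a window of four coarse samples corresponds to eight samples of the original signal.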

At this point our method can be split into two variants: regularised and non-regularised.

• regularised variant: This variant uses the PSVM approach to find the optimal weight
vector $w^k \in \mathbb{R}^{L_k+1}$. According to Fung and Mangasarian (2001) the optimal $w^k$ is the
solution of the following optimisation problem

$$\min_{w^k, \gamma_k, \xi^k} \frac{1}{2}\|w^k\|_2^2 + \frac{1}{2}\gamma_k^2 + \frac{\nu_k}{2}\|\xi^k\|_2^2 \qquad (2)$$

subject to constraints

$$Y(A^k w^k - \gamma_k e) + \xi^k = e \qquad (3)$$

where $\xi^k$ is the error vector and $\nu_k > 0$.

• non-regularised variant: Similarly to the regularised variant, the optimal weight
vector $w^k \in \mathbb{R}^{L_k}$ is given by solving the following optimisation problem

$$\min_{w^k, \gamma_k, \xi^k} \frac{1}{2}\|w^k\|_2^2 + \frac{1}{2}\gamma_k^2 + \frac{\nu_k}{2}\|\xi^k\|_2^2 \qquad (4)$$

subject to constraints

$$Y\left(A^k \begin{pmatrix} 1 \\ w^k \end{pmatrix} - \gamma_k e\right) + \xi^k = e \qquad (5)$$

where $\xi^k$ is the error vector and $\nu_k > 0$. The only difference from the previous variant
is that the dimensionality of $w^k$ is $L_k$ instead of $L_k + 1$ and $x_{e,i}(k)$ is multiplied by one.

In this variant we can also add some extra constraints such that in the case of polynomial
signals (up to some degree $p_k$) we will get wavelet coefficients equal to zero. These
constraints can be written in the following way

$$B^k w^k = e_1 \qquad (6)$$

where $e_1 \in \mathbb{R}^{p_k}$ and $B^k$ consists of the first $p_k$ rows of the Vandermonde matrix for
some knots $t_1, \ldots, t_{L_k}$. For more details on how to select knots we refer the reader to
(Claypoole et al., 1998) and (Fernandez et al.).

The additional constraints could be useful if the analysed signals are a superposition of a
polynomial and some other, possibly interesting, component. They imply that the polynomial
part of the analysed signal is eliminated and thus the interesting component will play a
bigger role in constructing discriminative wavelet coefficients. The constructed base
will also have properties similar to those of the standard wavelet base. In the appendix the reader
can find information on how to efficiently solve this extended optimisation problem.
We have not used this variant in our experiments but present it for completeness.

3. In the presented experiments we assumed that $L_k = L$ for some constant $L \in \mathbb{N}$.

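Having the optimal weight vector $w^k$ we can calculate the vector $d^k \in \mathbb{R}^l$ of discriminative
wavelet coefficients using the following equations

• regularised variant

$$d^k(i) = \langle w^k, (x_{e,i}(k), -c_i^k) \rangle \qquad i = 1, 2, \ldots, l$$

• non-regularised variant

$$d^k(i) = x_{e,i}(k) - \langle w^k, c_i^k \rangle \qquad i = 1, 2, \ldots, l$$

where $\langle \cdot, \cdot \rangle$ is the standard inner product.
As a result we obtain a matrix $D \in \mathbb{R}^{l \times N/2}$

$$D = (d^1 \; \cdots \; d^{N/2})$$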
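
For the regularised variant, the weight vector and the resulting coefficients can be sketched as follows. The sketch assumes the closed-form PSVM solution of Fung and Mangasarian, $(w, \gamma) = (I/\nu + E^T E)^{-1} E^T Y e$ with $E = (A \; -e)$, and solves the small linear system by plain Gaussian elimination; the function names and the toy data are ours, and the extra polynomial constraints are not handled.

```python
def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def psvm_fit(A, y, nu):
    """Return (w, gamma) for the regularised problem with labels y in {-1, +1}."""
    E = [list(row) + [-1.0] for row in A]            # E = (A, -e)
    n = len(E[0])
    M = [[sum(r[i] * r[j] for r in E) + (1.0 / nu if i == j else 0.0)
          for j in range(n)] for i in range(n)]      # I/nu + E^T E
    rhs = [sum(yi * r[i] for yi, r in zip(y, E)) for i in range(n)]  # E^T Y e
    z = solve_linear(M, rhs)
    return z[:-1], z[-1]


def coefficients(A, w):
    """Discriminative coefficients d(i) = <w, row i of A>."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]


# Toy matrix standing in for A^k: four examples, two columns.
A_k = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
w, gamma = psvm_fit(A_k, y, nu=1.0)
d = coefficients(A_k, w)
```

In this reading, a single coefficient classifies example $i$ by the sign of `d[i] - gamma`. The system that is solved has only as many unknowns as $A^k$ has columns plus one, so it stays small even for many training examples.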

3.4 Iteration step 

The whole algorithm can be written in the following form

• Let $M$ be the number of iterations (decomposition levels).

• Let $A_0 = A$.

• For $m = 1, \ldots, M$ do

  - Calculate $C_m \in \mathbb{R}^{l \times N/2^m}$ and $D_m \in \mathbb{R}^{l \times N/2^m}$ by applying the three steps described in
    the previous section to the matrix $A_{m-1}$.

  - Set $A_m = C_m$.

The output of the algorithm is a set of matrices $C_M, D_1, \ldots, D_M$. On the basis of
these matrices we create the new training set

$$X^{new} = \{(x_i^{new}, y_i) \in \mathbb{R}^N \times \{-1, +1\} : i = 1, \ldots, l\} \qquad (7)$$

where new examples are created by merging rows of the matrices $C_M, D_1, \ldots, D_M$.
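
The iteration and the merging of rows can be sketched as below. To keep the example short, the PREDICT substep is passed in as a function, and the one used here (`haar_like_predict`, the difference between the even subsignal and the coarse approximation) is only a stand-in of ours, not the PSVM-based step of the method; the averaging update is likewise an assumption.

```python
def decompose(A, M, predict):
    """Run M decomposition levels and return (C_M, [D_1, ..., D_M])."""
    details = []
    current = A
    for _ in range(M):
        A_o = [row[0::2] for row in current]          # SPLIT
        A_e = [row[1::2] for row in current]
        C = [[(o + e) / 2.0 for o, e in zip(ro, re)]  # UPDATE (assumed average)
             for ro, re in zip(A_o, A_e)]
        details.append(predict(A_e, C))               # PREDICT (pluggable)
        current = C                                   # A_m = C_m
    return current, details


def haar_like_predict(A_e, C):
    """Stand-in for PREDICT: detail = even sample minus coarse sample."""
    return [[e - c for e, c in zip(re, rc)] for re, rc in zip(A_e, C)]


def merge_features(C_M, details):
    """Build one new example per row by concatenating C_M, D_1, ..., D_M rows."""
    return [sum((D[i] for D in details), list(C_M[i])) for i in range(len(C_M))]


A = [[1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]]   # one signal, N = 8
C_M, details = decompose(A, 2, haar_like_predict)
X_new = merge_features(C_M, details)
```

With $N = 8$ and $M = 2$ the new example has $2 + 4 + 2 = 8$ entries, so the transformed signal has the same length as the original one.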
3.5 Method summary

We introduced a method that maps the set of signals $X$ into a new set of signals $X^{new}$.
In the presented setting this map is a linear and invertible function $f : \mathbb{R}^N \to \mathbb{R}^N$

$$f(x) = (c_M, d_M, \ldots, d_1)$$

where

$$c_M \in \mathbb{R}^{N/2^M}, \quad d_M \in \mathbb{R}^{N/2^M}, \quad \ldots, \quad d_2 \in \mathbb{R}^{N/4}, \quad d_1 \in \mathbb{R}^{N/2}$$

are calculated by the method. With increasing $m$ more and more samples from the original
signal are used to calculate the expansion coefficients. For example, if we set $L_k = L$ for all $k$
then $L \cdot 2^m$ samples of the original signal are used to calculate a coefficient at decomposition level $m$.
Here we present the two most important features of the method

• The motivation for the method is that only a small part of the signals is important in
the classification process. The method tries to identify this important part adaptively.

• By exploiting natural parallelism (calculating $w^k$ is completely independent for each $k$)
and the Sherman-Morrison-Woodbury formula (Gene H. Golub, 1996) the method can
be implemented very efficiently. In Appendix A we show how to properly solve the
optimisation problems that appear in our method.

4. Applications 

This section describes possible applications of the proposed method. It is
divided into two parts. In the first part we present an illustrative example of analysing
artificial signals with the proposed method. In the second part we present the results for
the real dataset.

4.1 Artificial datasets

Here we present results obtained on artificial datasets: Waveform and Shape.

4.1.1 Dataset description



Waveform is a three-class artificial dataset (Breiman, 1998). For our experiments we used
a slightly modified version (Saito, 1994). Three classes of signals were generated using the
following formulas

$$x^1(i) = u h_1(i) + (1 - u) h_2(i) + \epsilon(i) \quad \text{class 1} \qquad (8)$$
$$x^2(i) = u h_1(i) + (1 - u) h_3(i) + \epsilon(i) \quad \text{class 2} \qquad (9)$$
$$x^3(i) = u h_2(i) + (1 - u) h_3(i) + \epsilon(i) \quad \text{class 3} \qquad (10)$$

where $i = 1, 2, \ldots, 32$, $u$ is a uniform random variable on the interval $(0, 1)$, $\epsilon(i)$ is a standard
normal variable and

$$h_1(i) = \max(6 - |i - 7|, 0)$$
$$h_2(i) = h_1(i - 8)$$
4.1.2 Analysis 




Figure 1: Examples from classes 1 and 2



For simplicity we decided to concentrate only on classes 1 and 2, presented in
Figure 1. For the purpose of this presentation we set the parameters of our method as follows

$L_k = 4$
$\nu_k = 1$
$M = 3$

Figure 2 presents coarse approximations (the first two rows) and the test error ratio
(the third row)⁴ of the calculated discriminative wavelet coefficients (evaluated on a separate
test set). Each column presents a distinct decomposition level of our method. It is easily seen
that coarse approximations are averaged and shortened versions of the original signals.
We believe that in some cases such averaging could be very useful, especially when the
analysed signals contain much noise. From the last row of Figure 2 we can deduce
that the classification ratio of some discriminative wavelet coefficients is comparable to
the classification ratio obtained by applying the PSVM method to the original dataset. We
can point out explicitly the period of time in which the two classes of signals differ most.
This feature we called locality. Let us take a closer look at the 6th discriminative wavelet
coefficient from the first decomposition level. To calculate this coefficient we need 8 out of
32 samples of the analysed signals (see the first row of Figure 3).

4. Test error ratio obtained using all samples was equal to 0.10.




Figure 2: Coarse approximations (two upper rows) and test error of discriminative wavelet 
coefficients (third row) for examples from classes 1 and 2. 



In Figure 3 one can see that the analysis base vectors with the lowest error ratio have
supports shorter than their length. This means that to discern classes 1 and 2 we do not
need all 32 samples but only a small fraction of them. Moreover, when comparing Figures
1 and 3 it is clear that the best analysis base vectors are nonzero where the supports of functions
$h_1$ and $h_3$ intersect, and this is the place where the analysed signals indeed differ.

Figure 4 shows the supports of the analysis and synthesis base vectors. It is easily seen
that the support of a base vector widens with the decomposition level.

4.1.3 Extracting new features 

The method we presented can also be used as a supervised feature extractor. Instead of
feeding a classifier with the original training set $X$ we use $X^{new}$ defined in (7). Table 1
contains the results of replacing the original data with the new features for classifying the Waveform dataset
and the Shape dataset (Saito, 1994) with the C4.5 classifier (Ian H. Witten, 1999). From this
table we can see that the misclassification ratio decreased considerably. We have also noticed a
substantial decrease in decision tree complexity. As our method is designed for two-class
problems and the datasets used are three-class problems, we used the one-against-one scheme
(Chih-Jen Lin and Chih-Wei Hsu).

Figure 3: Best synthesis (left) and analysis (right) base vectors for each decomposition level

Figure 4: Supports of analysis and synthesis discriminative base



Dataset     Misclassification ratio
            Original    New
Waveform    0.290       0.186
Shape       0.081       0.023

Table 1: Effect of feature extraction for C4.5. Numbers are misclassification ratios.



4.1.4 Ensemble of local classifiers 

The coefficients calculated by our method can also be used directly for classification. Table 2
contains the test error ratios for the Waveform and Shape datasets obtained by voting among a few of the best
coefficients. As in the previous experiment we used the one-against-one scheme for decomposing
the multi-class problems into three two-class problems.
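
The voting step can be illustrated with a minimal sketch. It assumes that each selected coefficient acts as a local classifier predicting the sign of the coefficient minus its threshold, and that the votes are combined by an unweighted majority; how the best coefficients are chosen and whether the votes are weighted is not specified here, so both are assumptions, as are the function names.

```python
def local_predict(d, gamma):
    """Prediction of one local classifier: sign of (coefficient - threshold)."""
    return 1 if d - gamma > 0 else -1


def vote(coeffs, gammas):
    """Unweighted majority vote over the selected local classifiers.

    coeffs -- coefficient values of one signal for the selected classifiers
    gammas -- the corresponding thresholds
    Use an odd number of classifiers to avoid ties.
    """
    total = sum(local_predict(d, g) for d, g in zip(coeffs, gammas))
    return 1 if total > 0 else -1


# Three selected coefficients of one signal and their thresholds.
label = vote([0.5, -0.2, 1.0], [0.0, 0.0, 0.3])
```

For a three-class problem decomposed one-against-one, such a vote would be run once for each pair of classes.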



Dataset     Misclassification ratio
            3 coefficients    15 coefficients    PSVM
Waveform    0.155             0.147              0.193
Shape       0.034             0.032              0.094

Table 2: Misclassification ratios for the voting scheme. We combined 3 and 15 coefficients.
The last column shows the misclassification ratio obtained using PSVM and all
samples.



4.1.5 Conclusions 

The presented method gives both accurate and comprehensible solutions to classification
problems. It can be very useful not only as a classifier inducer but also as a source of
information about the classified signals. In the next section we support our claims by presenting
the results obtained on the real dataset.



4.2 Classifying evoked potentials 

In this section we present the results obtained on a dataset collected at the Nencki Institute
of Experimental Biology of the Polish Academy of Sciences. The dataset consists of sampled
evoked potentials of a rat's brain recorded in two different conditions. As a result the dataset
consists of two groups of recordings (CONTROL and COND) that represent two different
states of the rat's brain. The aim of the experiment was to explain the differences between
the two groups. We refer the reader to Kublik et al. (2001) and Wypych et al. (2003) for
more details and previous approaches to the data.
It should be mentioned that the problem is not a typical classification task. This is due
to the following reasons

• Each example (evoked potential) is labelled with unknown noise, which means that
some examples may be incorrectly labelled.

• The problem is ill-conditioned due to a small number of examples (45-100) and a huge
dimension (1500 samples).

• The biologists who collected the data were interested not only in a good classification
ratio but also in an explanation of the differences between the two groups.





Figure 5: Averaged evoked potentials for five rats. Red colour denotes COND and blue
denotes CONTROL. Only the first, most informative 45 ms (450 samples) are presented.

Figure 5 presents averaged potentials from the two classes for a group of five rats. We show
only the first 45 ms because differences in this period of time can be easily interpreted by
biologists.

After applying our method to the evoked potentials of each rat we chose those local
classifiers whose classification accuracy was greater than or equal to 0.75 and statistically
significant at the 0.1 level with respect to permutation tests (Wypych et al., 2003). The
result of this selection is depicted in Figure 6. It is clear that the most interesting parts
of the signals are 2.9-4 ms and 11.7-12.8 ms. Figure 7 shows how each potential is classified
by the selected local classifiers. It should be read in the following manner

• The vertical line divides the potentials into two groups: CONTROL (on the left) and COND
(on the right).

• The Y axis shows how the selected classifiers agreed on classifying a potential.

• The potentials were grouped (red and blue) depending on how they were classified.
Those marked in green could not be classified.

• We claim that those groups show two different states of the rat's brain.

The presented method gave very similar results to the previous approaches (Kublik et al.,
2001), (Wypych et al., 2003) and (Smolinski et al., 2002). Thanks to the locality feature of our
method we were able not only to classify potentials but also to point out the most informative
part of the signals. For a detailed physiological interpretation of the results we refer the
reader to (Jakuczun et al., 2005).

Figure 6: Histograms showing which parts of the analysed signals are commonly indicated for
all rats. The picture shows the first four levels of decomposition of our method.



5. Conclusions 

In this article we presented a new method for classifying signals. The method is iterative
and adapts to local structures of the analysed signals. If carefully implemented it can be very
efficient, and when used by an experienced researcher it can be a very powerful tool for the
discriminative analysis of signals. There are many possible extensions to our method but the most
interesting seem to be the following

• Modification of the method to handle two-dimensional signals such as images.

• Applying the kernel trick in constructing local classifiers. That would lead to nonlinear
classifiers and possibly better accuracy.

• Constructing classifiers using Multi Kernel Learning approaches (Bach et al.).

Figure 7: Charts presenting how each potential was classified by the selected local classifiers.
The vertical line divides the potentials into two groups (CONTROL is on the left,
COND is on the right).



Appendix A. Efficiently solving optimisation problem for non-regularised 
and regularised version 

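Here we explain how to efficiently solve the optimisation problem defined by (4), (5), (6). Let
us write the Lagrangian for the optimisation problem

$$L(w^k, \gamma_k, \xi^k, u^k, v^k) = \frac{1}{2}\left(\|w^k\|_2^2 + \gamma_k^2\right) + \frac{\nu_k}{2}\|\xi^k\|_2^2 - (u^k)^T\left(Y\left(A^k \begin{pmatrix} 1 \\ w^k \end{pmatrix} - \gamma_k e\right) + \xi^k - e\right) + (v^k)^T\left(B^k w^k - e_1\right)$$

where $u^k \in \mathbb{R}^l$ is the Lagrange multiplier associated with the equality constraint (5) and
$v^k \in \mathbb{R}^{p_k}$ is the Lagrange multiplier associated with the equality constraint (6).
Setting the gradients of $L$ to zero we get the following optimality conditions

$$w^k = (\tilde{A}^k)^T Y u^k - (B^k)^T v^k \qquad (12)$$
$$\gamma_k = -e^T Y u^k \qquad (13)$$
$$\xi^k = \frac{1}{\nu_k} u^k \qquad (14)$$
$$Y\left(\hat{A}^k + \tilde{A}^k w^k - \gamma_k e\right) + \xi^k = e \qquad (15)$$
$$B^k w^k = e_1 \qquad (16)$$

where $A^k = (\hat{A}^k \; \tilde{A}^k)$, that is, $\hat{A}^k$ is the first column of $A^k$ and $\tilde{A}^k$ consists of the remaining $L_k$ columns.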



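Substituting (12) into (16) we get

$$v^k = \left(B^k (B^k)^T\right)^{-1}\left(B^k (\tilde{A}^k)^T Y u^k - e_1\right) \qquad (17)$$

Substituting (12), (13), (14) and (17) into (15) we get

$$Y\left\{\tilde{A}^k (\tilde{A}^k)^T Y u^k - \tilde{A}^k (B^k)^T \left(B^k (B^k)^T\right)^{-1}\left(B^k (\tilde{A}^k)^T Y u^k - e_1\right) + e e^T Y u^k\right\} + \frac{1}{\nu_k} u^k = e - Y \hat{A}^k \qquad (18)$$

Simplifying (18) we get

$$Y\left\{\tilde{A}^k (\tilde{A}^k)^T - \tilde{A}^k (B^k)^T \left(B^k (B^k)^T\right)^{-1} B^k (\tilde{A}^k)^T + e e^T\right\} Y u^k + \frac{1}{\nu_k} u^k = e - Y \hat{A}^k - Y \tilde{A}^k (B^k)^T \left(B^k (B^k)^T\right)^{-1} e_1 \qquad (19)$$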



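Let matrix $H_1^k$ be defined as

$$H_1^k = Y\left(\tilde{A}^k \quad -\tilde{A}^k (B^k)^T (C^k)^{-1} \quad e\right) \qquad (20)$$

and matrix $H_2^k$ be defined as

$$H_2^k = Y\left(\tilde{A}^k \quad \tilde{A}^k (B^k)^T (C^k)^{-1} \quad e\right) \qquad (21)$$

where $C^k$ is a matrix satisfying $(C^k)^T C^k = B^k (B^k)^T$ (for example a Cholesky factor), so that
$H_1^k (H_2^k)^T$ equals the matrix in braces in (19) multiplied by $Y$ on both sides.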

Rewriting equation (19) we obtain

$$\left(\frac{1}{\nu_k} I + H_1^k (H_2^k)^T\right) u^k = e - Y \hat{A}^k - Y \tilde{A}^k (B^k)^T \left(B^k (B^k)^T\right)^{-1} e_1 \qquad (22)$$

Setting vector $b^k = e - Y \hat{A}^k - Y \tilde{A}^k (B^k)^T \left(B^k (B^k)^T\right)^{-1} e_1$ we get that vector $u^k$ is given
by the following set of equations

$$\left(\frac{1}{\nu_k} I + H_1^k (H_2^k)^T\right) u^k = b^k \qquad (23)$$

Solving the above set of equations is very expensive, as the number of equations is equal to the
number of training examples $l$, which can be large. Using the Sherman-Morrison-Woodbury
formula (Gene H. Golub, 1996) we can calculate $u^k$ as follows

$$u^k = \nu_k\left(I - H_1^k\left(\frac{1}{\nu_k} I + (H_2^k)^T H_1^k\right)^{-1} (H_2^k)^T\right) b^k \qquad (24)$$

It should be stressed that using equation (24) for computing $u^k$ is much less expensive than
using equation (23), because the matrix

$$\frac{1}{\nu_k} I + (H_2^k)^T H_1^k$$

has dimensions determined by the number of columns of $H_1^k$ and $H_2^k$ (of order $L_k + p_k$), which is independent of the number of training examples.
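
The saving offered by formula (24) is easiest to see in the rank-one case, where $H_1$ and $H_2$ each consist of a single column and the inner matrix reduces to a scalar (the Sherman-Morrison formula). The sketch below is only an illustration under that simplifying assumption; the function name and the numbers are ours.

```python
def smw_rank_one(h1, h2, b, nu):
    """Solve ((1/nu) I + h1 h2^T) u = b without forming the l x l matrix.

    Uses u = nu * (b - h1 * (h2^T b) / (1/nu + h2^T h1)), the one-column
    special case of the Sherman-Morrison-Woodbury formula.
    """
    h2b = sum(x * y for x, y in zip(h2, b))
    h2h1 = sum(x * y for x, y in zip(h2, h1))
    scale = h2b / (1.0 / nu + h2h1)
    return [nu * (bi - h1i * scale) for bi, h1i in zip(b, h1)]


h1 = [1.0, 2.0, 0.0]
h2 = [0.5, 1.0, 1.0]
b = [1.0, 0.0, 2.0]
nu = 2.0
u = smw_rank_one(h1, h2, b, nu)

# Check the result by multiplying back: (I/nu + h1 h2^T) u should equal b.
h2u = sum(x * y for x, y in zip(h2, u))
residual = [ui / nu + h1i * h2u - bi for ui, h1i, bi in zip(u, h1, b)]
```

The cost is linear in the number of training examples, whereas solving the original system directly would require factorising an $l \times l$ matrix.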

Similarly to the non-regularised variant presented above, we can use the same techniques
to solve the optimisation problem (2) and (3). For more details see (Fung and Mangasarian,
2001).